make some calculations 250-500 times faster by Safari77 · Pull Request #372 · printfn/fend

Safari77 · 2025-12-31T10:05:26Z

Before:
$ time ./fend 2^1555.555|b3sum
0567ac5de698ba87f79145ebf29c1ab169d37b185e76ddd4b8ecc2de2c82d4de -

real 1m42,334s
user 1m41,908s
sys 0m0,032s

After:
$ time ./fend 2^1555.555|b3sum
0567ac5de698ba87f79145ebf29c1ab169d37b185e76ddd4b8ecc2de2c82d4de -

real 0m0,401s
user 0m0,401s
sys 0m0,004s

And for the gcd:

Before:
$ time ./fend 0.9911^192|b3sum
315e6e5e5b9716413c0e6cbe347ed8a7e2923b3cec6290db7d88ff9710aea09e -

real    0m19,810s
user    0m19,757s
sys     0m0,005s

After:
$ time ./fend 0.9911^192|b3sum
315e6e5e5b9716413c0e6cbe347ed8a7e2923b3cec6290db7d88ff9710aea09e -

real    0m0,049s
user    0m0,040s

System information: Rust 1.92.0, LLVM 20.1.8 x86_64-redhat-linux-gnu, Intel i5-13600K, opt-level=3. Contents of ~/.cargo/config.toml:

[build]
rustflags = ["-C", "target-cpu=native"]

Functions changed:

add_assign_internal: Replaces high-level get/set overhead with efficient, vectorizable slice iteration using .zip() to eliminate bounds checks in the hot loop.
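
As a rough illustration of the zip-based loop described for add_assign_internal, here is a minimal sketch on plain little-endian u64 limbs. The function name `add_assign_limbs` and the Vec<u64> representation are assumptions for this example, not the actual code in core/src/num/biguint.rs.

```rust
/// Adds `rhs` into `lhs` in place. Both are little-endian base-2^64 limbs.
/// Hypothetical helper for illustration only.
fn add_assign_limbs(lhs: &mut Vec<u64>, rhs: &[u64]) {
    // Make sure every limb of `rhs` has a partner in `lhs`.
    if lhs.len() < rhs.len() {
        lhs.resize(rhs.len(), 0);
    }
    let mut carry = false;
    let (low, high) = lhs.split_at_mut(rhs.len());
    // Zipping two slices avoids indexed access, so no per-limb bounds check is needed.
    for (a, &b) in low.iter_mut().zip(rhs) {
        let (sum, c1) = a.overflowing_add(b);
        let (sum, c2) = sum.overflowing_add(u64::from(carry));
        *a = sum;
        carry = c1 || c2;
    }
    // Propagate a remaining carry through the limbs that had no partner.
    for a in high.iter_mut() {
        if !carry {
            break;
        }
        let (sum, c) = a.overflowing_add(1);
        *a = sum;
        carry = c;
    }
    if carry {
        lhs.push(1);
    }
}
```

Calling it with lhs = [u64::MAX, u64::MAX] and rhs = [1] grows lhs by one limb to hold the final carry.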

sub: Optimizes borrowing logic using u128 overflow detection and safely clamps loop ranges to handle unnormalized inputs without panicking.
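
One possible reading of the borrow handling described for sub, sketched on plain limb slices: the comparison is done in u128 so the borrow never overflows, and the loop is driven by the minuend so extra zero limbs on the subtrahend are ignored. The name `sub_limbs` and the precondition a >= b are assumptions for this example.

```rust
/// Returns a - b for little-endian base-2^64 limbs, assuming a >= b numerically.
/// `b` may carry extra high zero limbs (an unnormalized input); they are ignored.
/// Hypothetical helper for illustration only.
fn sub_limbs(a: &[u64], b: &[u64]) -> Vec<u64> {
    let mut out = Vec::with_capacity(a.len());
    let mut borrow: u128 = 0;
    for (i, &x) in a.iter().enumerate() {
        // A missing limb in `b` counts as zero; widening to u128 keeps y + borrow exact.
        let y = u128::from(b.get(i).copied().unwrap_or(0)) + borrow;
        let x = u128::from(x);
        if x >= y {
            out.push((x - y) as u64);
            borrow = 0;
        } else {
            // Borrow 2^64 from the next limb.
            out.push((x + (1u128 << 64) - y) as u64);
            borrow = 1;
        }
    }
    // Drop high zero limbs so the result is normalized.
    while out.len() > 1 && out.last() == Some(&0) {
        out.pop();
    }
    out
}
```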

sub_assign: Performs subtraction in-place on mutable buffers to eliminate expensive memory allocations during the Karatsuba recombination step.

mul: Dispatches to Karatsuba multiplication for inputs larger than 64 limbs (4096 bits) to reduce algorithmic complexity from O(N²) to O(N^1.585).

mul_karatsuba_slice: Recursively computes Karatsuba products using &[u64] slices instead of cloning BigUint vectors, significantly reducing memory churn.

mul_internal_slice: Provides a highly optimized Schoolbook (O(N²)) multiplication base case that operates directly on slices for maximum cache efficiency.
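
To make the interplay between the three multiplication entries more concrete, here is a self-contained sketch of slice-based Karatsuba with a schoolbook base case. It is not the code from this PR: every function name is hypothetical, the split point is simply half of the shorter input, and only the 64-limb threshold is taken from the description above.

```rust
/// Limb count at or below which the schoolbook product is used
/// (64 is the dispatch threshold quoted in the description).
const KARATSUBA_THRESHOLD: usize = 64;

/// Schoolbook product of two little-endian limb slices; result has a.len() + b.len() limbs.
fn mul_schoolbook(a: &[u64], b: &[u64]) -> Vec<u64> {
    let mut out = vec![0u64; a.len() + b.len()];
    for (i, &x) in a.iter().enumerate() {
        let mut carry: u128 = 0;
        for (j, &y) in b.iter().enumerate() {
            // x*y + out + carry is at most 2^128 - 1, so it fits in u128.
            let t = u128::from(x) * u128::from(y) + u128::from(out[i + j]) + carry;
            out[i + j] = t as u64;
            carry = t >> 64;
        }
        out[i + b.len()] = carry as u64;
    }
    out
}

/// acc += v * 2^(64 * offset). Limbs of `v` beyond the end of `acc` must be zero.
fn add_shifted(acc: &mut [u64], v: &[u64], offset: usize) {
    let acc = &mut acc[offset..];
    let n = v.len().min(acc.len());
    let (low, high) = acc.split_at_mut(n);
    let mut carry = false;
    for (a, &x) in low.iter_mut().zip(&v[..n]) {
        let (s, c1) = a.overflowing_add(x);
        let (s, c2) = s.overflowing_add(u64::from(carry));
        *a = s;
        carry = c1 || c2;
    }
    for a in high {
        if !carry {
            break;
        }
        let (s, c) = a.overflowing_add(1);
        *a = s;
        carry = c;
    }
}

/// acc -= v in place; requires acc >= v numerically and acc.len() >= v.len().
fn sub_assign(acc: &mut [u64], v: &[u64]) {
    let (low, high) = acc.split_at_mut(v.len());
    let mut borrow = false;
    for (a, &x) in low.iter_mut().zip(v) {
        let (d, b1) = a.overflowing_sub(x);
        let (d, b2) = d.overflowing_sub(u64::from(borrow));
        *a = d;
        borrow = b1 || b2;
    }
    for a in high {
        if !borrow {
            break;
        }
        let (d, b) = a.overflowing_sub(1);
        *a = d;
        borrow = b;
    }
}

/// a + b as a new vector with one spare limb for the carry.
fn add(a: &[u64], b: &[u64]) -> Vec<u64> {
    let (long, short) = if a.len() >= b.len() { (a, b) } else { (b, a) };
    let mut out = long.to_vec();
    out.push(0);
    add_shifted(&mut out, short, 0);
    out
}

/// Karatsuba product; result has a.len() + b.len() limbs.
fn mul_karatsuba(a: &[u64], b: &[u64]) -> Vec<u64> {
    let shorter = a.len().min(b.len());
    if shorter <= KARATSUBA_THRESHOLD {
        return mul_schoolbook(a, b);
    }
    let m = shorter / 2;
    let (a0, a1) = a.split_at(m);
    let (b0, b1) = b.split_at(m);
    let z0 = mul_karatsuba(a0, b0); // low halves
    let z2 = mul_karatsuba(a1, b1); // high halves
    // z1 = (a0 + a1)(b0 + b1) - z0 - z2 = a0*b1 + a1*b0
    let mut z1 = mul_karatsuba(&add(a0, a1), &add(b0, b1));
    sub_assign(&mut z1, &z0);
    sub_assign(&mut z1, &z2);
    // out = z0 + z2 * B^(2m), then add z1 * B^m in one pass.
    let mut out = z0;
    out.extend_from_slice(&z2);
    add_shifted(&mut out, &z1, m);
    out
}
```

The recursion only passes sub-slices of the inputs; the allocations that remain are the two half-sums and the partial products, which is the kind of saving the mul_karatsuba_slice entry describes.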

div_rem_knuth (called by divmod): Replaces slow bit-by-bit binary division with Knuth's Algorithm D (base-2⁶⁴), reducing the number of division steps by a factor of 64.

root_n: Replaces Binary Search (linear convergence) with Newton's Method (quadratic convergence), reducing iteration count from thousands to dozens for large roots.
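
The Newton iteration for integer roots can be sketched on machine integers as follows. This is only an illustration of the technique: the real root_n works on the big integer type, while the hypothetical `iroot` below takes a u64 and uses u128 for intermediates.

```rust
/// Returns floor(x^(1/n)) for n >= 1 using Newton's method.
/// Hypothetical helper for illustration only.
fn iroot(x: u64, n: u32) -> u64 {
    assert!(n >= 1, "root degree must be at least 1");
    if x < 2 || n == 1 {
        return x;
    }
    let x = u128::from(x);
    // Initial guess 2^ceil(bits / n), which is strictly above the true root.
    let bits = 128 - x.leading_zeros();
    let shift = (bits + n - 1) / n;
    let mut r: u128 = 1u128 << shift;
    loop {
        // Newton step: next = ((n - 1) * r + x / r^(n - 1)) / n.
        // If r^(n - 1) overflows u128 it exceeds x, so the quotient is 0.
        let q = match r.checked_pow(n - 1) {
            Some(p) => x / p,
            None => 0,
        };
        let next = (u128::from(n - 1) * r + q) / u128::from(n);
        // Starting from above, the iterates decrease until they reach the floor of the root.
        if next >= r {
            return r as u64;
        }
        r = next;
    }
}
```

Bisection gains roughly one bit of the result per step, whereas each Newton step roughly doubles the number of correct bits once the estimate is close, which is where the drop in iteration count comes from.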

lshift / rshift: Moves interrupt checks outside the loop and simplifies carry logic to allow the compiler to generate efficient block memory moves.

add_assign_shifted: A specialized helper for Karatsuba that performs a "shift-and-add" operation in one pass without creating intermediate shifted values.

gcd: Replaces the division-heavy Euclidean algorithm with Stein's Algorithm (Binary GCD),
utilizing efficient bitwise shifts and subtraction to eliminate expensive modulo operations during
fraction simplification.
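
For reference, Stein's algorithm on machine integers looks roughly like this. It is a sketch on u64 with a hypothetical function name; the PR applies the same idea to the big integer type.

```rust
/// Binary GCD (Stein's algorithm): uses only shifts, comparisons and subtraction.
/// Hypothetical helper for illustration only.
fn binary_gcd(mut a: u64, mut b: u64) -> u64 {
    if a == 0 {
        return b;
    }
    if b == 0 {
        return a;
    }
    // Factor out the power of two common to both inputs.
    let shift = (a | b).trailing_zeros();
    // Make `a` odd; it stays odd for the rest of the loop.
    a >>= a.trailing_zeros();
    loop {
        // Remaining factors of two in `b` are not shared, so drop them.
        b >>= b.trailing_zeros();
        if a > b {
            std::mem::swap(&mut a, &mut b);
        }
        // Both are odd and a <= b, so b - a is even and non-negative.
        b -= a;
        if b == 0 {
            // Restore the common power of two.
            return a << shift;
        }
    }
}
```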


codecov · 2026-01-01T04:03:41Z

Codecov Report

❌ Patch coverage is 75.47974% with 115 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.17%. Comparing base (9636b16) to head (6a52ead).
⚠️ Report is 4 commits behind head on main.

Files with missing lines	Patch %	Lines
core/src/num/biguint.rs	75.47%	115 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #372      +/-   ##
==========================================
- Coverage   81.47%   81.17%   -0.30%     
==========================================
  Files          52       52              
  Lines       14717    15032     +315     
==========================================
+ Hits        11990    12202     +212     
- Misses       2727     2830     +103


printfn · 2026-01-01T04:07:57Z

Thank you!

Safari77 marked this pull request as draft December 31, 2025 10:36

Safari77 force-pushed the performance branch from 667361e to 7d0d1c0 Compare December 31, 2025 10:44

Safari77 marked this pull request as ready for review December 31, 2025 10:46

Safari77 changed the title ~~make it 250 times faster~~ make some calculations 250-500 times faster Dec 31, 2025

printfn added 2 commits January 1, 2026 03:56

Remove unused code

9f25925

Fix clippy warnings

6a52ead

printfn merged commit 602508d into printfn:main Jan 1, 2026
7 checks passed

printfn merged 3 commits into printfn:main from Safari77:performance